Using NU_admission_data.csv, create two separate plots derived from the single plot depicted in undergraduate-admissions-statistics.pdf. The visual and its data were collected from https://www.adminplan.northwestern.edu/ir/data-book/. The original overlays two plots on one another by using dual y-axes.
Create two separate plots that display the same information instead of trying to put it all in a single plot, then stack them using patchwork or cowplot.
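As a rough sketch of the stacking step only, the snippet below builds one bar panel and one line panel and stacks them vertically with patchwork. The data frame `toy`, its column names (`year`, `applications`, `admit_rate`), and all of its values are placeholders invented for illustration; they are not the contents of NU_admission_data.csv, and the real plot will need the actual columns and labels.

```r
library(ggplot2)
library(patchwork)

# Placeholder values for illustration only; NOT taken from NU_admission_data.csv.
toy <- data.frame(
  year         = c(2001, 2002, 2003),
  applications = c(100, 120, 150),
  admit_rate   = c(40, 35, 30)
)

# Top panel: counts drawn as bars on their own y-axis.
bar_plot <- ggplot(toy, aes(x = year, y = applications)) +
  geom_col() +
  labs(x = NULL, y = "Applications")

# Bottom panel: rates drawn as a line on their own y-axis.
line_plot <- ggplot(toy, aes(x = year, y = admit_rate)) +
  geom_line() +
  geom_point() +
  labs(x = "Entering year", y = "Admission rate (%)")

# In patchwork, `/` places the left-hand plot above the right-hand plot.
stacked <- bar_plot / line_plot

# cowplot alternative (not run here):
# cowplot::plot_grid(bar_plot, line_plot, ncol = 1, align = "v")
```

Printing `stacked` draws both panels with a shared x-axis orientation, so each quantity keeps its own clearly labeled y-axis.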
There is one major error they make with the bars in their graphic. Explain what it is.
Solution
They stacked the bars incorrectly: the three components of each bar are simply placed on top of one another. For example, the 2001 bar represents a total of 13,987 applications, but on the y-axis the top of that bar exceeds 20,000. Admitted students and matriculants are subsets of the applicants, so the bars should be overlaid (shaded) so that each bar's height equals that year’s total number of applications. Instead, the numbers of admitted students and matriculants are added on top of the total number of applications, which falsely inflates the height of the bars.
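The arithmetic behind the inflated 2001 bar can be sketched as follows. Only the application total (13,987) comes from the text above; the admitted and matriculant counts are made-up placeholders chosen to illustrate the effect, not values from the data.

```r
# Only `applications` is from the text; the other two counts are hypothetical.
applications <- 13987
admitted     <- 4500   # placeholder, not from the data
matriculants <- 1900   # placeholder, not from the data

# What the original graphic does: segments stacked end to end.
stacked_height <- applications + admitted + matriculants

# What it should do: nested groups drawn from zero, so the tallest
# bar is the applications total.
overlaid_height <- max(applications, admitted, matriculants)

stacked_height   # 20387
overlaid_height  # 13987
```

In ggplot2, `geom_col()` stacks bars by default (`position = "stack"`); one way to get the overlaid version is `position = "identity"`, drawing the largest group first so the smaller ones remain visible on top.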
Which approach do you find communicates the information better, their single dual y-axis plot or the two-plot approach? Why?
Solution
I think the two-plot approach communicates this information better. The single plot would be fine if the y-axes were the same or could be directly mapped onto each other, but these axes are different, which makes it difficult to discern which data are mapped onto which axis. Additionally, the sheer number of numeric labels clutters the visualization, and some labels overlap, which hurts readability. In the visualization’s current state, it’s best to use two separate plots.
Hints:
Form 4 datasets (helps you get organized, but not entirely necessary):
1 that has bar chart data,
1 that has bar chart label data,
1 that has line chart data, and
1 that has line chart labels
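One possible way to organize the four datasets is sketched below with dplyr and tidyr. The column names (`applications`, `admitted`, `matriculants`, `admit_rate`, `yield_rate`) and the values are assumptions made for illustration; check the actual CSV and replace the toy tibble with `readr::read_csv("NU_admission_data.csv")` or similar.

```r
library(dplyr)
library(tidyr)

# Hypothetical columns and placeholder values; the real CSV may differ.
admissions <- tibble(
  year         = c(2001, 2002),
  applications = c(100, 120),
  admitted     = c(40, 42),
  matriculants = c(20, 21),
  admit_rate   = c(40, 35),
  yield_rate   = c(50, 50)
)

# 1. Bar chart data: one row per year and count category.
bar_data <- admissions %>%
  select(year, applications, admitted, matriculants) %>%
  pivot_longer(-year, names_to = "category", values_to = "count")

# 2. Bar chart labels: same rows plus a formatted text label.
bar_labels <- bar_data %>%
  mutate(label = formatC(count, format = "d", big.mark = ","))

# 3. Line chart data: one row per year and rate type.
line_data <- admissions %>%
  select(year, admit_rate, yield_rate) %>%
  pivot_longer(-year, names_to = "rate_type", values_to = "rate")

# 4. Line chart labels: same rows plus a percent label.
line_labels <- line_data %>%
  mutate(label = paste0(rate, "%"))
```

Keeping labels in their own data frames makes it easy to pass them to `geom_text(data = bar_labels, ...)` and adjust label positions without touching the plotted values.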
Consider using ggsave() to save the image with a fixed size so it is easier to pick font sizes.
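A minimal sketch of saving at a fixed size follows. The plot `p`, the file name, and the dimensions are arbitrary choices for illustration; the point is that once width, height, and dpi are fixed, font sizes chosen in the theme or in `geom_text()` look the same every time the file is regenerated.

```r
library(ggplot2)

# Stand-in plot with placeholder data; substitute the stacked admissions plot.
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) +
  geom_line() +
  theme_minimal(base_size = 12)

# Write to a temporary directory here; use a project path in practice.
out_file <- file.path(tempdir(), "admissions_plot.png")

# Fixed physical size and resolution, so text scales predictably.
ggsave(out_file, plot = p, width = 8, height = 10, units = "in", dpi = 300)
```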